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ABSTRACT 



Recent research in the area of database machines has 
been directed at achieving greater efficiency and increasing 
user-friendliness. This thesis is concerned with the second 
of these research directions, increasing user-friendliness. 
One development toward increased user-friendliness is the 
growing acceptance of the relational data model and rela- 
tional query languages. Relational interfaces provide the 
user with an easy-to-understand data representation and 
language with which to manipulate the data. 

This thesis presents the design and analysis of a rela- 
tional query language interface, using the SQL relational 
query language, for the Multi-Backend Database System 
(MDBS) , a database machine which uses the attribute-based 
model. The purpose is two-fold: first, to provide the user 
with an eas ier-to-understand language-to-machine interface, 
thereby making MDBS available to the wider community of 
relational database users; second, to investigate how the 
attribute-based model may be used to support relational 
da tabases . 
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I. INTRODUCTION 



The rapid growth in the use of database management sys- 
tems (DBMSs) has stimulated research to produce more effi- 
cient and easier-to-use systems. Greater efficiency is re- 
quired in order to offset the inherent costs of operating a 
DBMS. One area of research directed at producing greater 
efficiency is in the development of database machines. Da- 
tabase machines use specially configured hardware, tailor- 
made software, and innovative techniques such as massive 
parallelism to support higher capacity and performance. 
Greater ease of use is necessary in order to ensure a wider 
distribution of use. The emergence of database systems us- 
ing the relational model of data is an important development 
in this area. 

One of the database machines of interest is the Multi- 
Backend Database System (MDBS). The idea of MDBS is to use 
general-purpose hardware and special-purpose software in a 
novel configuration to provide a backend database machine 
solution. The design and development of MDBS is an ongoing 
project [Ref. 1 and 2]. In this thesis, we will not examine 
the particular database machine solution to the efficiency 
problem. Rather the contribution of this thesis to the MDBS 
research is in the area of ease of use. We will determine 
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how the relational query language of SQL [Ref. 3] can be 
supported by the attribute-based query language of 'MDBS 
[Ref. 1]. In the next two sections, a brief review of the 
design goals of MDBS and of the development of relational 
query languages is presented. In the final section of the 
chapter, the organization of the thesis is discussed. 



A. DESIGN GOALS FOR EFFICIENCY IN MDBS 

As previously mentioned, research in database machines 
has been driven by the need to develop more efficient sys- 
tems. Efforts have resulted in a variety of machines which 
include: CASSM [Ref. 4 and 5], RAP [Ref. 6], DBC [Ref. 7 and 
8], DIRECT [Ref. 9], MDBS [Ref. 1 and 2], RDBM [Ref. 10], 
VERSO [Ref. 11], DBMAC [Ref. 12], and I DM [Ref. 13]. This 
is not a complete listing, but does illustrate the fact that 
no "best" architecture has been developed. Each of the 
machines listed are unique. This uniqueness makes classifi- 
cation impossible. However, while no true taxonomy of data- 
base machines exists, Strawser [Ref. 14] cites several 
design issues that can be used to categorize the systems. 
Three of these issues, processor structure, interconnection 
of the processor and the database store, and alternative 
physical organizations have particular relevance to the MDBS 
design. Within each of the issues there exist tradeoffs 
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What 



that affect the performance of the various machines, 
follows is a brief description of these three design issues, 
and the MDBS solutions. 

Many database machines are organized with a single con- 
trol processor and one or more slave processors. As in any 
such system, the control processor is a potential 

bottleneck. Some designs seek to alleviate this problem by 
having the control processor perform only administrative 
tasks, or by otherwise limiting its responsibility. At the 
other end of the spectrum, some machines permit the control 
processor to participate in query execution. Irrespective 
of the design chosen, throughput will be inversely propor- 
tional to the amount of work levied on the control proces- 
sor. A goal of the MDBS design is to minimize the poten- 
tial control processor bottleneck. The control processor 
performs a minimal set of functions, only those which are 
necessary to administer query execution. 

Additional differentiation of processor structures can 
be made between homogeneous and heterogeneous multiprocessor 
organizations ♦ Homogeneous organizations use a number of 
processors with identical functionality. This allows for a 
high degree of intra - query parallelism . The heterogeneous 
organization is characterized by a number of processors with 
specialized functionality, thus permitting inter - query 
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parallelism . MDBS uses a homogeneous multiprocessor organi- 



zation, offering a high degree of intra-query parallelism. 
The software in the backend processors is identical, allow- 
ing easy expansion of the system by replicating the software 
when new backend processors are added. The backend proces- 
sors operate in parallel. However, the backends also 
operate independently. Each backend has a separate schedul- 
ing mechanism, to make the optimum use of resources. Com- 
munication between the processors is via a broadcast bus, to 
minimize communication overhead. 

There are two major categories to describe the intercon- 
nection of the processor and the database store. The first, 
direct interconnection , connects the processor directly to 
the database store. While this method has an advantage in 
that the processors never have to wait for data, it suffers 
in two respects. The processor must be able to work at 
speeds equivalent to the transfer rate of the secondary 
storage devices, and data sharing among processors is 
severely limited. The second major category is the 
hierarchical interconnection . This method, which is more 
prevalent, transfers data from the database store to RAM 
storage for access by the processors. In MDBS, each backend 
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processor has dedicated disk drives, eliminating contention 
between processors for the same device. Data is staged from 
the disk to buffers in the main memory. 

Like other design issues, the motivation for seeking al- 
ternative physical organizations is to enhance performance. 
Two such designs are the data pool organization used in 
DBMAC [Ref. 12], and the V-Relation scheme used in VERSO 
[Ref. 11]. However, the gains realized from these organiza- 
tions apply only to some operations. MDBS uses a cluster- 
ing methodology to organize the database. Records in the 
database are divided into clusters based on attribute 
values. The clusters of the database are then spread across 
the backends, so that the advantages of parallelism are 
realized for all operations. In other words, for database 
access, MDBS attempts to achieve 
record-serial-cluster-parallel operations . 



B. RELATIONAL QUERY LANGUAGE 
Each successive generatio 
sought to make data manipu 
idea is to remove from the 
knowing the particularities 
representations of databases, 
and then the network model. 



DEVELOPMENTS 

of database languages has 
ation more user-friendly. The 
user any responsibility for 
of system structure. Early 
first the hierarchical model 
require the user to understand 
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the organization 
through it for 
of the user data 
to present the 
representation o 
retrieval and 
manipulation . 



of the database in order to navigate 
the purpose of storage, retrieval and update 
. The relational database approach attempts 
user with an easy-to-understand tabular 
f the stored data which makes the storage, 
update operations as simple as table 



Codd [Ref. 15] first proposed tuple relational calculus 
as a benchmark for evaluating data manipulation languages 
based on a relational model. The mathematical concept 
underlying the relational model is the set theoretic rela- 
tion, which is a subset of the Cartesian product of a list 
of domains. A relation is any subset of the Cartesian pro- 
duct of one or more domains. Conceptually, a relation can 
be viewed as a simple, two-dimensional table that has 
several properties. First, the entries in the table are 
single-valued; neither repeating groups nor arrays are al- 
lowed. Secondly, the entries in any column are all of the 
same kind, that is each column has a domain of values that 
can appear in the column. Each column has a unique name and 
the order of the columns is immaterial. In the relational 
model columns are referred to as attributes . 

The advantages inherent in the relational model are that 
no artificial constructs such as sets or pointers are 
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required, and that the data is represented in tabular (rela- 
tional) form in a way that is familiar to the user. Opera- 
tions on the data are specified logically or symbolically by 
relational algebra or calculus. This is of major importance 
in that while the data structure is predefined, the record 
relationships are not defined until they are used. Conse- 
quently, any relationship that can be expressed in relation- 
al algebra or calculus can be used. Among the advantages 
cited for relational DBMSs is increased productivity in ap- 
plications development, due to the simplicity and 
flexibility of the model and the relational query languages. 

The importance of the relational model in regards to 
this paper is not in its implementation, but rather the log- 
ical representation it offers to the user. This representa- 
tion is developed through the use of relational query 
languages like SQL. SQL, earlier called SEQUEL, was first 
introduced by Chamberlin [Ref. 16] to be used with the 
relational model. It was another attempt to provide the 
user with an English-like language with which he could 
construct and manipulate his database. Developments and 
changes to the language grew out of IBM System R research 
[Ref. 3]. 
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As pointed out by Hsiao [Ref. 17] the relational model 
suffers in its lack of solutions to the problems of database 
transformation and query translation. Conversely, any rela- 
tional database may be transformed, in a straightforward 
way, into the attribute-based database used by MDBS. There- 
fore it is practical to think in terms of a relational data- 
base implemented on MDBS. Developing a relational query 
language interface to MDBS has several advantages. First, 
we provide an easy-to-use interface which afford the user 
the productivity increase claimed for relational query 
languages. Second, by choosing to implement the interface 
for SQL, the most widely used relational query language, we 
provide homogeneity for a wide community of database system 
users. Third, we identify those areas in MDBS where 
enhancements must be made in order to provide a full 
relational language capability. 

C. ORGANIZATION OF THE THESIS 

In Chapter 2 an overview of the organization of the 
multi-backend system is presented. Chapter 3 describes the 
MDBS query language. The SQL query language is described in 
Chapter 4. Chapter 5 explains the mappings from SQL to the 
MDBS query language. Chapter 6 offers recommendations for 
implementation, and Chapter 7 summarizes the conclusions 
obtained from the research experience. 
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II. ORGANIZATION OF THE MULTI-BACKEND SYSTEM 



An understanding of the organization of the multi- 
backend database system is helpful in understanding some of 
the design considerations of the MDBS query language. Fig- 
ure 1 is a representation of the MDBS hardware organization. 
The system is comprised of a controller and a number of 
backends, all general-purpose minicomputers. A broadcast 
bus connects the controller and the backends. Each backend 
has a dedicated number of disk drives. 

The major design goals of MDBS are to allow the database 
to grow and the rate of requests to increase while maintain- 
ing good overall performance. To obtain these goals the 
multi-backend database system should have the following 
properties : 

(1) Throughput improvement is proportional to the 
multiplicity of backends; 

(2) Response time is inversely proportional to the 
multiplicity of backends; 

(3) The system is extensible for future growth 
and/or performance improvement; 

These properties are obtained through various MDBS design 
considerations. MDBS seeks to minimize the potential of the 
control processor to be a bottleneck by minimizing the 
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Computer 



c :> 



Figure 1 



Overview of MDBS 



controller functions 



Accordingly, MDBS is viewed in terms 



of controller functions and backend functions, as depicted 
in Figure 2. Each backend is responsible for conducting its 
own operations, including queueing and scheduling of re- 
quests. Identical operating software is maintained at each 
backend. Expansion of the system is accomplished by 
replicating this software in additional backends. 

The database is distributed across all the backends via 
the clustering mechanism, explained in the next section of 
this chapter. Requests from the controller are broadcasted 
to all the backends at the same time for processing. This 
allows for parallel processing of requests. Requests are 
queued at each backend. To permit continuous execution of 
requests each backend schedules request execution indepen- 
dently. The addition of more backends results in an increase 
in parallel processing of requests, which improves 
throughput and response time. 

In the next three sections, descriptions of the 
attribute-based data model, the functions of the controller 
and the functions of the backend are presented. 



A. THE ATTRIBUTE-BASED DATA MODEL 
The data model used in MDBS is 
developed by Hsiao and Harary [Ref 



the attribute-based model 
17], In their work they 
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Figure 2 



Functional Division of MDBS 



use the set A to represent attributes and the set V to 



represent values ♦ A record is then defined to be a subset 
of the Cartesian product A x V, where each attribute has one 
and only one value. This way the record , R, is a set of 
ordered pairs of the form (an attribute, its value). 

For each record R, a set of its attribute-value pairs 
which collectively characterize R is formed. These sets of a 
record collection form an index ♦ These ordered pairs in the 
index are called the keywords . The index is used to 
identify a record or a set of records. 

Following the keywords is the record body, which is a 
string of characters not used by MDBS for search purposes. 
An example of a record index without a following body is 
shown below. 

( <FILE, employee >, <NAME, Smi th> ,<CITY, Monte rey>, <RANK, 3>) 
The first attribute-value pair in all records of a file are 
the same, since it designates the file name. In the example 
above the file name is "employee". 

In order to enhance the performance of the system, 
records are logically grouped into clusters . The clustering 
is determined by the attribute values and attribute value 
ranges in the records. In the example above, records could 
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be clustered on the NAME attribute, with all employees 
having a last name starting with the letter 'S' clustered 
together . 

Keyword predicates are used in the data manipulation 
language for search and retrieval purposes. The keyword 
predicate has the form (attribute, relational operator, 
value). For example, 

(SALARY > 2000) 

is a simple greater-than predicate. A keyword is said to 
satisfy a predicate if the attribute of the keyword is 
identical to the attribute of the predicate and the relation 
specified by the relational operator of the predicate holds 
for the value of the keyword and the value of the predicate. 
For example, the keyword <RANK,4> satisfies the predicate 
(RANK < 6) . 

A conjunction is simply a conjunction of predicates, 
such as: 

(SALARY > 10000) A (RANK = 3) 

A record satisfies a query conjunction if the record 
contains keywords that satisfy every predicate in the 
conjunction. A query is a boolean expression of predicates, 
such as : 

((DEPT = Sales) A (SALARY < 10000)) V 
((DEPT = Sales) /\ (SALARY > 15000)) 
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B. IMPLEMENTATION OF THE ATTRIBUTE-BASED MODEL IN MDBS 

The indices of the attribute-based model are 
implemented in MDBS as descriptors . Descriptors are defined 
for designated directory attributes. The rules of 
definition require that the descriptors for a directory 
attribute form a partition over the domain of the attribute. 

Clusters result from the partitioning of the database 
according to the descriptor definitions. A record belongs 
to the cluster defined by the set of descriptors which can 
be derived from the keywords of the record. 

The clustering mechanism provides an ideal vehicle for 
distributing data across the backends of MDBS to take full 
advantage of parallelism. The records of a cluster are 
distributed track-at-a-time across all the backends. 
Therefore the work of query execution can be shared across 
the backends, with each backend processing the query against 
its portion of the relevant cluster (s). For a more detailed 
explanation of the clustering mechanism, readers are 
referred to [Ref. 1], 
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C. FUNCTIONS OF THE CONTROLLER 

It is important to reiterate that a basic design 

consideration of MDBS is to minimize the functions of the 
controller. These functions are divided into three 

categories: request preparation, insert information 

generation, and post processing. The request preparation 
functions comprise the necessary operations performed on a 
request prior to its broadcast to the backends. These 

functions include parsing and syntax checking. Insert 

information generation functions are performed during the 
processing of an insert request in order to supply 

additional information needed by the backends. Post 
processing functions are performed after replies are 

returned from the backends. For example, these functions 
perform housekeeping duties on the separate responses to the 
single broadcast request, that is, the collection of the 
data prior to transmission to the host machine. 

D. FUNCTIONS OF THE BACKEND 

Functions within each backend are divided into three 
categories: directory management, record processing, and 

I 

concurrency control. The directory management function is 
further divided into descriptor search, cluster search, 
address generation, and directory table maintenance. It is 
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responsible for searching through the descriptors and 
clusters to determine the disk addresses for the records to 
be accessed. The record processing functions include: record 
storage, record retrieval, record selection, and attribute 
value extraction of the retrieved records. Concurrency 
control is maintained by the locking of clusters to prevent 
conflicting access to the same clustered data. 

Figure 3 is a representation of the operations performed 
on a user request. A request is submitted to the host, 
which converts it to the internal form of the MDBS 
environment. The controller parses the request and checks 
for syntax errors, then broadcasts the request to all of the 
backends. The work of descriptor search is shared among all 
backends. Each backend does its portion of descriptor 
search, and broadcasts its findings to all the other 
backends. When all descriptors have been identified, each 
backend independently performs cluster search. The 
appropriate records are then selected, values extracted and 
the results sent back to the controller. The controller 
collects the results from all the backends, performs any 
final aggregation required, and forwards the data to the 
host . 
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Results 



iaure 3 



Request Flow in MDBS 
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THE MDBS QUERY LANGUAGE 



The query language for MDBS is a non-procedural language 
in which queries are expressed in the disjunctive normal 
form. The language itself supports four different types of 
requests: retrieve, insert, delete, and update. Appendix A 

is a formal specification of the language. In the examples 
below, reserved words are capitalized and optional portions 
of queries are enclosed in brackets. 

A. THE RETRIEVE REQUEST 

The RETRIEVE is the most flexible of the operations on 
the database. It is the user's vehicle to query the data- 
base for information. Unlike the other three operations, 
the retrieve does not alter the contents of the database. 
Its syntax is: 

RETRIEVE Query Target list [BY attribute] [WITH pointer]. 



As shown above, 


the 


RETRIEVE request 


consists of five parts. 


The last 


two 


parts 


, those enclosed 


in 


square brackets 


, are 


optional. 


The 


operator RETRIEVE is 


a 


reserved word 


and 


indicates 


the 


type 


of request. 


The 


query is made 


up of 


predicates 


in 


the 


disjunctive normal 


form. The 


query 



defines the portion of the database which is to be 
retrieved. The target list is the list of attributes for 
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which values are to be extracted from the records which 
satisfy the query. The attribute value may be a value from 
the record, or an aggregate of values from multiple records. 
Five aggregate operators are supported in MDBS: AVG , COUNT, 
MAX, MIN, and SUM. 

The BY^ clause performs an ordering on the data returned. 
For example, to RETRIEVE all the employees names ordered 
according to department, the following query can be used. 



RETRIEVE (FILE = Employee ) <NAME> BY DEPT 



The WITH clause specifies whether pointers to the 
retrieved records must be returned to the user for later use 
in an update request. 

Let us examine some examples of RETRIEVE requests. 
Example 1. Retrieve the names of all employees who make 
less than $10,000. 

RETRIEVE ( (FILE = Employee) A (SALARY < 1 0000 ) ) <NAME> 
Example 2. Retrieve the average salary of employees who 
have a rank greater than 2, order by depart- 
ment. 



RETRIEVE ( (FILE = Employee) A (RANK > 2)) 
<AVG (SALARY) > BY DEPT 
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B. THE INSERT REQUEST 

The INSERT request is used to add records to the data- 
base. The syntax is: 

INSERT Record 

where record is the record to be inserted into the database. 
An example of an INSERT request is: 

INSERT ( <FILE , Employee > , <NAME , Smi th> , < SALARY , 10000> ) 
This creates a record in the employee file for Smith and 
sets his salary at 10000. 

C. THE DELETE REQUEST 

The DELETE request is used to remove records from the 
database. The syntax is: 

DELETE Query 

where query is of the same form as that used in the RETRIEVE 
request. An example is: 

DELETE ((FILE = Employee) A (NAME = Smith)) 

This deletes all records in the employee file for employees 
named Smith. 

D. THE UPDATE REQUEST 

The UPDATE request is used to modify values for records 
which already exist in the database. The syntax is: 

UPDATE Query Modifier 

where the query specifies the particular records to be 
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modified and modi f ier indicates the type of modification 



that is to be performed. MDBS allows five types of 
modifications . 

The TYPE -0 modification sets the new value of the attri- 
bute being modified to a constant. An example of a TYPE-0 
modification is: 

UPDATE ((FILE = Employee) /\ (NAME = Smith)) 

< SALARY = 5000> 

This sets the salary of all employees named Smith to 5000. 

In the TYPE -I modification , the new value of the attri- 
bute is set to some function of the old value of the attri- 
bute in the record being modified. An example of a TYPE-I 
modification is: 

UPDATE ((FILE = Employee) A (NAME = Smith)) 

<SALARY = 2 * SALARY> 

This doubles the salary of all employees named Smith. 

The TYPE - I I mod i f icat ion sets the new value of the at- 
tribute to some function of another attribute contained 
within the same record. Where a TYPE-I modification was a 
function of the same attribute, the TYPE-II modification 
looks at another attribute to derive a value. An example of 
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a TYPE-II modification is: 



UPDATE ((FILE = Employee) A (NAME = Smith)) 

<SALARY = 8 * RANK> 

This makes all the salaries of employees with the last name 
Smith equal to eight times the value of their rank. 

The TYPE - II I modifier derives the new value of the at- 
tribute being modified from some function of another attri- 
bute value contained in another record which is identified 
by the query in the modifier. An example of a TYPE-III 
modification is: 

UPDATE ((FILE = Employee) A (NAME = Smith) ) <SALARY = 
SALARY of (FILE = Positions) A (JOB = Manager) > 
Here employees named Smith get their salary set to that of a 
manager's, as recorded in the Positions file. 

The TYPE - IV modifier derives the new value of the attri- 
bute being modified from a function of another attribute 
value in another record identified by the pointer in the 
modifier. This requires a retrieval request first in order 
to obtain the value for the pointer. An example of a 
TYPE-IV modification is: 

RETRIEVE ( (FILE = Employee) A (NAME = Jones)) with Pointer 
The retrieve request returns the value of a pointer, in this 
example, say, 2000. So we can then execute the following 
update request. 
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IV. THE RELATIONAL QUERY LANGUAGE, SQL 



Data in the relational data model is depicted as a two- 
dimensional table. The relational query language, SQL 
attempts to exploit this representation. It does this by 
providing an English-like language that allows the user to 
list the attributes from a relation meeting the user's 
selection requirements. For a more complete description, 
the reader is referred to [Ref. 3 and 16] . 

Various implementations of SQL provide many functions 
and facilities beyond the basic SQL. The four basic con- 
structs are: select, insert, delete, and update. However, 
in illustrating the use of the basic constructs, we include 
some other functions and facilities. In particular, some of 
the examples and constructs shown below are those imple- 
mented by the Oracle Corporation [Ref. 18] database 
management system. 

A. THE SELECT REQUEST 

The SELECT request is used for retrieval of data from 
the database. Its general form is as follows. 

SELECT A , . . . , A 

FROM R 

WHERE B b AND . . . AND B <$ b 
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where A and B are attributes found in the relation R, is a 
relational operator (such as >, <, = , = , >, < ),, and b is a 
constant. In particular, B Q b is termed a predicate. 
Within the general guidelines above, SQL offers a great deal 
of latitude in the formation of the SELECT query. Let us 
look at each clause separately, the SELECT clause, the FROM 
line, and the WHERE line. 

Instead of listing the attributes to be retrieved on the 
SELECT clause the user may request the return of the entire 
relation by using the wild card character, SQL also 
allows for the use of aggregate operators (such as AVG, SUM, 
MAX), arithmetic operators (such as +,-,/) , and arith- 
metic functions (such as ROUND, TRUNC) . Additionally, SQL 
permits the user to define the format for the retrieved 



data. These are only some of the basic variations 
permitted. Examples of these options follow: 

Example 1. Retrieve all the attributes for all the 
employees. (Use of the wildcard.) 



SELECT * 

FROM Employee 
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Example 2 



Example 3 



Example 4 



Example 5 



Obtain the average salary of all the 
employees. (Use of an aggregate operator.) 

SELECT AVG (Salary) 

FROM Employee 

Obtain the total of the salary and com- 
mission for each employee. (Use of an 
arithmetic operator.) 

SELECT Salary + Commission 
FROM Employee 

Retrieve the salaries of all the employees, 
rounded to the nearest dollar. (Use of an 
arithmetic function.) 

SELECT ROUND (Salary) 

FROM Employee 

Retrieve the dates of hiring for all the em- 
ployees, and format them to read 
month/day/year (ex. 09/24/50) . 

(Use of a format option.) 
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SELECT TO_CHAR (Hi redate , 'MM/DD/YY' ) Hiredate 
FROM Employee 

The FROM line identifies the relation or relations from 
which data is to be retrieved. A single relation is speci- 
fied for simple retrievals. Two or more relations are 
specified for join operations. 

An example of a simple SELECT on a single relation is as 
follows . 

Example 6. Return the names of all the employees. 

SELECT Name 
FROM Employee 

An example of a join, involving two relations in this case, 
is as follows. 

Example 7. Return all the names and locations of the de- 
partments which have an employee named Smith. 
SELECT Name, Location 
FROM Employee, Department 
WHERE Name = Smith 

The WHERE line establishes the conditions on which the 
retrieval is to be made. Predicates are used to qualify the 
selection of tuples from the relations (s) . Only those 
tuples which satisfy the predicates are selected. Like the 



36 



SELECT line it has many variations. These variations in- 
clude: an attribute of the relation compared to some con- 
stant, the testing of an attribute for set membership, the 
use of boolean operators to create complex conditions, and 
the ability to nest additional SELECT clauses in order to 
extract values for comparison. The following are examples 
of some of these variations. 

Example 8. Retrieve the names and salaries of all the 

employees that have a salary equal to 10000. 
(Comparison of an attribute to a constant.) 

SELECT Name, Salary 
FROM Employee 
WHERE Salary = 10000 

Example 9. Obtain the names of the employees whose jobs 
are either a clerks, analysts, or managers. 

(A test for set inclusion.) 

SELECT Name 
FROM Employee 

WHERE Job IN (Clerk, Analyst, Manager) 
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Example 10. List the names of all the employees that 

have a salary equal to 10000 and are named 
Smith. (A logical AND operation.) 

SELECT Name 
FROM Employee 

WHERE Salary = 10000 AND Name = Smith 

Example 11. List the name and job of employees who have 
the same job as Smith. (A nested SELECT.) 

SELECT Name, Job 
FROM Employee 
WHERE Job = 

(SELECT Job 
FROM Employee 
WHERE Name = Smith) 

B. THE INSERT REQUEST 

The INSERT request is used to create rows (tuples) in a 
relation (table) and has the general form: 

INSERT INTO R 
VALUES (V ,... f V) 

where R is the relation name and V is a value. The order 
in which the data values are listed in the INSERT must 
correspond to the order of the columns in the table. An 
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example of an INSERT is as follows. 

INSERT INTO Employee 
VALUES (Smith, 2,10000) 

This example creates a new tuple within the employee rela- 
tion. Assuming that the Employee relation has attributes 
name, rank, and salary, a new tuple is created with the name 
being Smith, the rank being 2, and the salary being 10000. 

C. THE DELETE REQUEST 

The DELETE removes a row (tuple) or rows (tuples) from a 
table (relation). It has the general form: 

DELETE FROM R 

WHERE B b , . . . ,B $ b 

where R is the name of the relation, B is an attribute of 
the relation, (|) is a relational operator, and b is a con- 
stant. The WHERE clause for the DELETE has the same options 
available that are in the SELECT. An example of a DELETE is 
as follows. 

DELETE FROM Employee 
WHERE Name = Smith 

This deletes all rows from the Employee table where the name 
is equal to Smith. 
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D. THE UPDATE REQUEST 

The UPDATE command changes the attribute values stored 
in the database. It has the general form: 

UPDATE R 

SETA a , . . . , A a 

WHERE B(j) b , . . . ,B dj b 

where R is the relation name, A is the attribute to be as- 
signed a new value, a . The WHERE clause has options as pre- 
viously discussed. An example of an UPDATE request is as 
follows . 

UPDATE Employee 
SET Salary = 20000 
WHERE Name = Smith 

This update results in all employees named Smith having 
their salaries set at 20000. 
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V 



THE MAPPINGS FROM SQL TO THE MDBS QUERY LANGUAGE 



The idea of a SQL interface to MDBS is to provide to the 
user a friendly interface. SQL was chosen as the query 
language because of its English-like syntax and the 
existence of a wide-spread community of SQL users. We must 
emphasize here that we are implementing an interface between 
the SQL users and MDBS. We are not adding functionality to 

i 

MDBS. 

The distinction between implementing an interface and 
adding functionality is important for the following reason. 
SQL is a relational query language. The primary operations 
of SQL are SELECT, UPDATE, INSERT, and DELETE. The special 
relational operations, projection and join, are included in 
SQL, as well as aggregate operations, ordering, and various 
set operations. SQL is usually supported by a relational 
database management system which implements all of these 
relational operations. 

MDBS, however, is not based on the relational model. 
The data model of the MDBS machine is the attribute-based 
model. The attribute-based model is flexible, and can sup- 
port relational data structures: relations, tuples, and at- 
tributes [Ref. 17], However, the functionality of MDBS does 
not encompass all relational operations. The four primary 
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operations of the MDBS machine are RETRIEVE, UPDATE, INSERT, 
and DELETE. The aggregate operations are also supported. 
MDBS does not support the join and ordering operations. Nor 
does it support set operations. 

From the discussion above, it is clear that the set of 
SQL operations to be included in our interface will be lim- 
ited to those supported by the functionality of MDBS. The 
subset of SQL operations which can be- supported by MDBS 
directly is formally specified in Appendix B. In the 
remainder of this chapter, we define the mappings from the 
subset of SQL which can be supported directly by the primary 
operations of MDBS. We present the mappings both in graph- 
ics and in text. In the next section we explain the graphic 
notations. The remaining sections of this chapter give the 
details of the mappings from SQL to the MDBS query language. 

A. GRAPHIC NOTATION 

We will show the mappings graphically, and also explain 
them in text. The graphic notation is illustrated in Figure 
4. The general forms of the SQL and MDBS queries used here 
have been developed in Chapters IV and V. The mappings are 
represented by directional arrows, and symbols indicating 
the type of the mapping. We have identified two types of 
mappings: syntactic substitution and conversion. 
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Syntactic - substitution mappings require only simple sub- 
stitution of syntactical terms. The symbol for this type of 
mapping is a square marked with the letter S. Figure 4 
shows two examples of a syntactic substitution mapping. The 
first example maps the SQL SELECT term to the MDBS RETRIEVE 
term. The second example maps the SQL sel_expr_l ist to the 
MDBS target_list. This example illustrates that a syntactic 
substitution may be a direct copy of clauses from the SQL 
query to the MDBS query. 

Conversion mappings combine a clause from a SQL query 
with information about the MDBS data structure to derive the 
clause of the MDBS query. The symbol for conversion mapping 
is a triangle marked with the letter C. In Figure 4, the 
mapping of the FROM and WHERE clauses of the SQL query into 
the query clause of the MDBS request is a conversion 
mapping . 

In Section B, we present the overall structure of the 
mappings from SQL queries to MDBS queries. In Sections C, 
D, and E, we discuss individually the three conversion 
mappings identified in Section B. 



B. MAPPING REQUESTS FROM SQL TO MDBS 

In this section, we show the syntactic-substitution 
mappings for the general forms of the SQL SELECT, UPDATE, 
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RETRIEVE 



sel-expr.l 1st 




[BY attricute] 



friqure 4. tapping tne SSL SELECT to the md&S RETRIEVE 




INSERT, and DELETE requests into the MDBS RETRIEVE, UPDATE, 
INSERT, and DELETE requests, respectively. The conversion 
mappings are explained in detail in subsequent sections. 

1. Mapping the SQL SELECT into the MDBS RETRIEVE 

The first mapping is the SQL SELECT request to the 
MDBS RETRIEVE request. The SELECT query has the general 
form: 

SELECT sel_expr_l ist FROM table_name 
[WHERE boolean] 

[GROUP BY field_name] 

The RETRIEVE request has the general form: 

RETRIEVE query target_list 
[BY attribute] 

The SELECT to RETRIEVE mapping has been shown in Figure 4. 
The reserved word RETRIEVE is substituted for the reserved 
word SELECT. The sel_expr_l ist is a list of attributes that 
the user wishes to access from the database, and directly 
corresponds to the MDBS target_list. Consequently, it can 
simply be copied into the MDBS request. The "FROM table_name 
[WHERE boolean]" portion of the SQL request requires a 
conversion mapping into the "query" portion of the MDBS 
language. This conversion will be discussed in Section C. 
The reserved words 'GROUP BY' of SQL are directly translated 
into the MDBS reserved word, BY. The attribute upon which 
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the grouping is to take place is copied from the SQL query 
to the MDBS request. 

2. Mapping the SQL INSERT into the MDBS INSERT 

Figure 5 illustrates the mapping required for the in- 
sert requests. The general form for the SQL INSERT request 
is : 

INSERT INTO table_name VALUES insert_spec 
The MDBS INSERT request's form is: 

INSERT record. 

The reserved word INSERT is the same for the two requests. 
The remaining portion of the SQL request, 'INTO table_name 
VALUES insert_spec ' , requires a conversion mapping into the 
record portion of the MDBS query. This conversion will be 
explained in Section D. 
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Figure 5. yapping the SCL INSERT to the y.DES INSERT, 
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3. Mapping the SQL DELETE into the MDBS DELETE 



The mapping for the delete requests is shown in Fig- 
ure 6. The delete request in SQL has the general form: 
DELETE FROM table_name [WHERE boolean] 

While in MBDS the general form is: 

DELETE query. 

The reserved word DELETE is common to both requests. The 
conversion of the "FROM table_name [WHERE boolean] " portion 
of the SQL request into the "query" portion of the MDBS re- 
quest is the same as that required in the SELECT request, 
and will be discussed in Section C. 

4. Mapping the SQL UPDATE into the MDBS UPDATE 

Figure 7 depicts the mapping for the update request. 
The general form for the update request in SQL is: 

UPDATE table_name se t_clause_l ist 
[WHERE boolean] 

In MDBS the form is: 

UPDATE query modifier. 

The SQL reserved word UPDATE is simply copied into MDBS. 
The "table_name [WHERE boolean]" conversion mapping is like 
that used in the SELECT and DELETE requests and will be ex- 
plained in Section C. The se t_clause_l ist of SQL requires a 
conversion mapping in order to match to the modifier 
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Figure 7. .Mapping the S&L UPDATE to the MDBS UPDATE 




portion of the MDBS request. That mapping is explained in 
section E. 

C. THE CONVERSION MAPPING TO THE MDBS QUERY 

The select, delete and update requests of SQL all have a 
"FROM table_name [WHERE boolean]" portion. In the update 
request it varies slightly in that the reserved word FROM is 
not used. However, the conversion required is essentially 
the same. This portion of the SQL request maps into the 
"query" portion of the MDBS retrieve, delete and update re- 
quests. However, due to the variety of forms and constructs 
available in SQL, a conversion is required to reconstruct 
this portion into an acceptable MDBS format. 

As illustrated in Figure 8, much of the conversion re- 
quires only a simple mapping. The specification of the MDBS 
query requires that the first attribute-value relationship 
be "FILE = attribute", where the attribute is the name of a 
file. This is equivalent to the SQL, "FROM table_name". 

In addition MDBS requires that queries be composed in 
the disjunctive normal form. SQL does not have this res- 
triction. This is demonstrated in the examples below, where 
the SQL "[WHERE boolean]" clause is mapped into a 
disjunction of conjunctions in the MDBS request. To explain 
the conversions required to convert SQL's "boolean" into an 
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acceptable MDBS "query", examples will be used. In 
case a SQL request will be shown, followed by 
corresponding request in MDBS. 

Example 1. Obtain the names of the employees that 
have a salary of 10000 and are clerks. 



SELECT Name 

FROM Emp 

WHERE Sal = 10000 AND Job = Clerk 



RETRIEVE 

( (File = Emp) A 

(Sal = 10000) A ( Job = Clerk)) 
<Name> 



each 

the 
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Example 2. Obtain the names of employees who have 



a salary between 5000 and 10000. 



SELECT Name 

FROM Emp 



WHERE Sal BETWEEN 5000 AND 10000 



RETRIEVE 



((File = Emp) A 

((Sal >= 5000) A (Sal <= 10000))) 
<Name> 
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Example 3. Obtain the names of employees who are 



employeed as a clerk, analyst or manager. 



SELECT Name 

FROM Emp 

WHERE Job IN (Clerk, Analyst, Manager) 



RETRIEVE 



(((File = Emp) A (J°b = Clerk)) V 

((File = Emp) A (Job = Analyst)) V 

((File = Emp) /\ (Job = Manager))) 

<Name> 



As seen in example 3 above, the reconstruction of 
request into acceptable MDBS disjunctive normal 
quires the identification of the file attribute 
predicate . 



the SQL 
form re- 
in each 



55 



D. THE CONVERSION MAPPING TO THE MDBS RECORD 



SQL's insert request uses "INTO table_name VALUES 
insert_spec" to identify the relation and attribute values 
that are to be inserted as a record. This corresponds to 
the "record" portion of the MDBS insert request. The MDBS 
record is a series of attribute-value pairs. The first pair 
is the file name (ex. <File,Emp>). This corresponds to 
SQL's "INTO table_name" which identifies the relation name. 
Figure 9 illustrates this mapping. 

The " insert_spec" portion of the SQL insert request is a 
listing of the values to be inserted in the relation. The 
ordering of the values must be identical to the ordering of 
the attributes in the relation, and all attributes must have 
an assigned value. MDBS, on the other hand, represents a 
record as a list of attribute-value pairs. There is no re- 
quirement for ordering of the attr ibute-value pairs, as 
values are matched with attributes. Nor does MDBS require 
that values be assigned to all attributes. Instead MDBS as- 
signs default values of zeros or spaces for integer and 
character attribute types. 

In order to implement the SQL insert request, the MDBS 
record template information will have to be made available 
to the interface. The attribute names in the record 
template are ordered. The attribute names in the template 
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can then be matched to the values listed in the SQL insert 
request to form the attribute-value pairs of the MDBS 
record. For example, the following SQL insert request 
INSERT INTO Emp VALUES Smith, Clerk, 10000 
would be converted to read 

INSERT (<File , Emp> , <Name , Smi th> , 

<Job,Clerk>, <Sal , 10000>) . 

Alternatives for implementing this conversion will be 

further discussed in Chapter 6. 

E. THE CONVERSION MAPPING INTO THE MDBS MODIFIER 

The "set_clause_l ist " of the SQL update request has a 
direct correlation to the "modifier" of the MDBS update re- 
quest. Figure 10 illustrates this mapping. SQL has 

constructs to represent the first four types of MDBS modif- 
iers. The TYPE-0 modification sets the new value of the at- 
tribute being modified to a constant. The TYPE-I modifica- 
tion obtains the new value of an attribute being modified by 
setting it. to some function of the old value. The TYPE-II 
modification sets the value of the attribute being modified 
to some function of another attribute contained within the 
same record. The TYPE-III modifier derives the value of the 
attribute being modified from some function of an attribute 
contained within another record. SQL has no construct which 
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Figure 10. Meeting tc the m D d s modifier 



new 



corresponds to a TYPE-IV modification, which derives the 
value of the attribute being modified from a function of 
another attribute value in another record identified by the 
pointer in the modifier. 

SQL offers a wide variety of constructs for its "expr" 
in the set_clause_l ist . In the examples below, we illustrate 
the correspondence between these constructs and the MDBS 
modifiers. The conversion required is a reordering or 
rewriting of these constructs into acceptable MDBS format. 
The conversion is much like that used in the query mapping 
of Section B. The following examples illustrate the conver- 
sions that are required. For simplicity, the examples are 
singular updates. The SQL request will be presented first, 
followed by the corresponding MDBS request. 
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Example 4. Set the salary of all employees named 



UPDATE 



UPDATE 



Smith to 10000. 

(Ex. MDBS Type-0 modification) 



Emp 

SET Salary = 10000 

WHERE Name = Smith 



((File = Emp) A 
(Name = Smith)) 

<Salary = 10000> 
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Example 5. Double the salary of all employees 



UPDATE 



UPDATE 



named Smith. 

(Ex. MDBS Type-1 modification) 



Emp 

SET Salary = 2 * Salary 

WHERE Name = Smith 



((File = Emp) A 
(Name = Smith)) 

<Salary = 2 * Salary> 
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Example 6. Set the salary of all employees 



UPDATE 



UPDATE 



named Smith to eight times the value 
of their rank. 

(Ex. MDBS Type-II modification) 



Emp 

SET Salary = 8 * Rank 

WHERE Name = Smith 



( (File = Emp) A 
(Name = Smith ) ) 

<Salary = 8 * Rank> 
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Example 7. Set the salaries of the employees 



UPDATE 



UPDATE 



named Smith to that of a manager's, 
as recorded in the Positions file. 
(Ex. MDBS Type-Ill modification) 



Emp 

SET Salary = (SELECT Salary 
FROM Positions 
WHERE Job = Manager) 
WHERE Name = Smith 



((File = Emp) A 
(Name = Smith ) ) 

<Salary = Salary of (File = Positions) A 
(Job = Manager ) > 
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VI. RECOMMENDATIONS FOR IMPLEMENTATION 



In Chapter V we demonstrated that a SQL-to-MDBS query 
language interface could be constructed for a subset of SQL. 
In this chapter, we will discuss the implementation issues. 
The first section deals with two areas of differences 
between the constructs of SQL and the MDBS query language. 
One is the MDBS requirement that queries be constructed in 
the disjunctive normal form. The other is the difference in 
the construct of the insert requests, as addressed in 
Chapter V, Section D. 

In the second section we give suggestions for expanding 
the capabilities of the SQL/MDBS interface to support some 
SQL constructs that MDBS does not directly support. These 
are constructs which can be mapped from a single SQL request 
into a series of MDBS requests. The third section discusses 
extending MDBS to support the join and sort operations. The 
last section of this chapter discusses the use of program 
development tools to aid in the actual implementation. 

A. SQL AND MDBS DIFFERENCES 

In order to effectively support the SQL-to-MDBS inter- 
face, two differences in construct between the two languages 
have to be resolved. The first is the MDBS requirement that 
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all queries be written in the disjunctive normal form. The 
second is the form of the SQL insert request as compared to 
the MDBS insert request. 

1 . The Disjunctive Normal Form 

MDBS requires the query portion of the retrieve, 
delete, and update requests to be written in disjunctive 
normal form. On the other hand most commercial versions of 
SQL do not place this restraint on the user. In order to 
support this capability, the interface will be required to 
translate the free-form logical SQL statements into the dis- 
junctive normal form. In the very simple cases this is not 
an extraordinary burden. However, in any involved query the 
cost of translation could be expensive. For this reason, 
and to simplify construction of the interface, we believe 
that the user should be required to formulate his requests 
in the disjunctive normal form. This should not place a 
burden on the user since typically most requests are of a 
simple construction. 

2. Differences in the Insert Request 

The syntax of the insert request in SQL places a bur- 
den on the user to know the construction of the table into 
which he/she wishes to insert values. Each field must have 
an assigned value, and values must be listed in the order of 
the field names in the table definition. MDBS, on the other 
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hand, specifies the record to be inserted as a list of 
attribute-value pairs. The attribute-value pair is a direct 
assignment of value to the indicated attribute. There is no 
constraint on the ordering of the pairs. 

We recommend an enhancement for the SQL language inter- 
face, a new syntax for the insert request. The revised gen- 
eral form would be 

INSERT INTO table_name VALUES i nser t_val ues . 

The syntax for insert_values would be 

insert_val ues := (f ield_name , insert_spec) 

I i nsert_values , (f ield_name, insert_spec) 
This change in syntax brings the SQL insert command into 
line with the attribute-value pair syntax of the MDBS query 
language. More importantly it is believed that this change 
will improve user-friendliness. 

B. EXPANDING THE FUNCTIONALITY OF THE INTERFACE 

There are some basic SQL constructs which, while not 
directly supported by MDBS, can be mapped into a series of 
MDBS requests. The most important of these is the nested 
select construct. Additionally, commercial implementations 
of SQL offer a variety of features for manipulating and 
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editing data in result relations. We discuss below how the 
SQL-to-MDBS interface can be extended to provide these 
features . 

1 . Implicit Joins Through Nested Selects 

SQL has the capability to nest select requests, as 
discussed in Chapter IV. MDBS does not have this capabil- 
ity. The SQL syntax dictates that the innermost nested 
select be evaluated first. Evaluation then proceeds out- 
ward. Translated into MDBS, this requires a succession of 
retrieve statements. The innermost select statement 
corresponds to the first retrieve request. The following is 
an example of a SQL request with a nested select, and a 
series of MDBS retrieve requests that obtain the same 
results . 
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Example. Obtain the names of the employees who have 
a salary equal to that of a manager. 



SELECT Name 
FROM Emp 

WHERE Sal = (SELECT Sal 

FROM Payroll 
WHERE Job = Manager) 



RETRIEVE ((File 
RETRIEVE ((File 



Payroll) /\ (Job = Manager)) <Sal> 
Emp) /\ (Sal = Sal)) <Name> 



In the above example the Sal value obtained in the first 
MDBS request would be used as the Sal value in the second 
retrieve in order to obtain the Name. 

In order to implement this capability in the interface 
we recommend that a pre-preprocessor be written that 
exclusively looks for nested selects. The pre-preprocessor 
finds the innermost select and sends it to the preprocessor. 
The value (s) obtained from the operation would then be 
inserted into the query portion of the next level select. 
This outward operation would continue until the entire 
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request had been executed. Utilization of the 
pre-preprocessor allows the preprocessor to operate on 
single select requests. 

2. Formatting Options 

SQL gives the user some options in the formatting of 
his/her output within the context of the select request. 
This includes creating new headings, indentations, and pro- 
ducing columnar/tabular outputs. The following example 
changes the column name based on the Sal attribute to a more 
readable heading, "Salary". 

SELECT Name, Sal Salary 

FROM Emp 

WHERE Name = Smith 

In order to make MDBS more user-friendly and useful in the 
area of report generation and formatting, we recommend that 
a post-processor be implemented as part of the interface. 
The post-processor would be responsible for performing the 
format and output options. 

3. Arithmetic Operations and Functions 

SQL affords the user the ability to specify arithmet- 
ic operations and functions on the values of the result re- 
lation. For example, the following select request creates a 
new column in the output called 'comm/sal' , which is derived 



70 



from two existing attributes by dividing the comm attribute 
by the sal attribute. 

SELECT Name, Comm/Sal, Comm, Sal 
FROM Emp 

WHERE Job = Salesman 

The following is an example of a arithmetic function option 
in SQL. 

SELECT Name, ROUND (Sal, 2) 

FROM Emp 

WHERE Job = Salesman 

This example rounds the value of Sal to two decimal places. 
These and similar operations, can be implemented in a 
post-processor . 

C. JOIN AND SORT OPERATIONS 

SQL and the relational data model support join and sort 
operations. Currently MDBS does not support either. Imple- 
mentation of the nested select, as discussed in the previous 
section, would enable MDBS to support implicit joins, i.e., 
nested select requests. As MDBS is still in development, 
further research is required into the feasibility and 
desirability of implementing these operations on MDBS. The 
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consideratons of costs versus the additional capability that 
would be provided by such an implementation is outside the 
scope of this paper. However, if implemented, the effort to 
include the required SQL-to-MDBS translation in the 
interface would be minimal. 



D. TOOLS FOR ACTUAL IMPLEMENTATION 

. We recommend that the actual implementation of the in- 
terface be done using Yacc [Ref. 19] and Lex [Ref. 20], pro- 
gramming tools developed at Bell Laboratories. They can be 
used to produce an interpreter which accepts SQL requests 
and outputs the translated MDBS request. 

Lex is a lexical analyzer generator, designed for lexi- 
cal processing of character input streams. The user sup- 
plies the specifications for character string matching. Lex 
then produces a program in the programming language C, which 
recognizes regular expressions. Lex is generally used with 
Yacc to recognize and supply tokens. 

Yacc, an acronym for Yet Another Compiler-Compiler, is a 
general tool for imposing structure on the input to a com- 
puter program. The user prepares a specification of the in- 
put process, i.e., rules and actions. Yacc then generates a 
program of functions to control the input process. Yacc 
calls the lexical analyzer (Lex) to supply tokens, and then 
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parses the supplied tokens according to the production 
rules. 

Utilization of Yacc and Lex has several advantages. 
First, the MDBS query parser was created using these tools. 
Therefore in the event of any required scanner/parser to 
scanner/parser communications, both would be operating in 
similar environments. Second, both are written in C, which 
improves their transportability. Third, Yacc and Lex are 
both well documented and are relatively easy to use. 
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VII 



CONCLUSION 



MDBS uses the attribute-based data model. Records are 
composed of ordered pairs of the form (an attribute, its 
value). Descriptors, or indices, are defined for selected 
directory attributes. These descriptors are used to parti- 
tion the database into clusters. The clusters are distri- 
buted across the backends to take full advantage of parallel 
execution of requests. 

The MDBS user accesses the database using a simple, 
non-procedural query language. The language supports four 
different types of requests: retrieve, insert, delete, and 
update. The retrieve query is used to access, but not 
alter, the contents of the database. The insert and delete 
requests are used to add or remove records in the database. 
The update request modifies existing records of the 
da tabase . 

SQL is a relational query language, designed for use 
with relational databases. Like the MDBS query language, it 
has four different types of requests: select, insert, 
delete, and update. Like the MDBS retrieve request, the 
select accesses, but does not alter, the contents of the da- 
tabase. The SQL insert, delete, and update requests perform 
operations similar to those of their MDBS counterparts. 
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However, unlike the MDBS query language, SQL offers a 
variety of options in the syntax of its requests. This 
variety enables SQL requests to be constructed with varying 
degrees of logical and syntactical complexity. 

In this thesis, we have identified the direct mappings 
from SQL queries into MDBS queries. These mappings can be 
directly supported in the SQL-to-MDBS interface. We have 
also identified those SQL constructs which have no direct 
mapping, but can be converted into a sequence of MDBS 
queries. Enhancements to the interface are proposed to sup- 
port these indirect mappings. Lastly, we have identified 
those SQL constructs for which no mapping exists. To sup- 
port these mappings, the functionality of MDBS must be aug- 
mented. Let us discuss each of these cases, identifying the 
contributions of this thesis and directions for further 
research . 

A. THE DIRECT MAPPINGS 

Some SQL queries can be directly mapped into MDBS re- 
quests. The retrieve, insert, delete, and update requests 
of MDBS have a direct functional correspondence to the SQL 
select, insert, delete, and update requests, respectively. 
There are three exceptions which require a degree of insight 
in order to perform a mapping. These are the mapping of the 
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"FROM table_name [WHERE boolean] " portion of 
"query" portion of MDBS, the mapping of 
table_name VALUES insert_spec" into the MDBS 
the mapping of the SQL "set_clause_l ist " 
"modifier" . 



SQL into the 
the SQL "INTO 
"record", and 
into the MDBS 



MDBS requires that the query portion of the request be 
written in disjunctive normal form. SQL, on the other hand, 
allows for free formatting of its logical constructs. To 
convert the SQL "FROM table_name [WHERE boolean] " construct 
into acceptable MDBS "query" format requires translating the 
options contained in the SQL "boolean" into MDBS disjunctive 
normal form. The complexity of this translation is 0(n**n), 
where n is the number of predicates in the boolean expres- 
sion. In order to limit the overhead of this translation, 
we recommend that the SQL- to-MDBS interface require the user 
to construct SQL qualifications in disjunctive normal form. 

The SQL insert request uses "INTO table_name VALUES 
insert_spec" to identify the relation and to list the values 
to be inserted. This list of values must correspond in ord- 
er and type to the constructed relation. MDBS, on the other 
hand, uses attribute-value pairs for insert parameters. One 
solution to this conversion is to have the SQL-to-MDBS 
interface provide to the user the MDBS record template for 
assignment of values. Another approach, which we recommend. 
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is to alter the syntax of 
correspond to the attri 
"record". This eliminates 
know the exact structure o 

SQL's "set_clause_list 
to identify the attribut 
update request. In additi 
date. With the excepti 
MDBS , which uses a pointer 
between the two languages 
quests. The only convers 
"se t_clause_l ist " into one 
types . 

These direct mappings 
thesis. Further research 
terface. For implementati 
preter could be const 
programming tools. 



SQL's insert request to make it 
bute-value pair syntax of the MDBS 
the requirement that the user 
f the relation definition. 

" and MDBS's "modifier" are used 
es to be changed as a result of an 
on, they specify the type of up- 
on of the TYPE-IV modification in 
, there is a direct correspondence 
in the syntax of their update re- 
ion required is formatting the 
of the acceptable MDBS "modifier" 

have been fully explained in this 
will involve implementing the in- 
on a lexical scanner and an inter- 
acted using the Yacc and Lex 
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B. ENHANCEMENTS TO SUPPORT FURTHER MAPPINGS 

There are several constructs which cannot be supported 
by direct mappings, but can be supported by an enhanced in- 
terface. The first of these is the implicit join operation, 
implemented in SQL by the nested SELECT. A re-preprocessor 
can be constructed to convert the nested SELECTS into a 
series of MDBS queries, and to control the iterative 
execution required. 

Several options are available in commercial versions of 
SQL which are not supported in MDBS, such as arithmetic 
operations and functions, and output formatting. In order 
to implement these features a post-processor could be con- 
structed. Further research will be required to design and 
analyze these pre- and post-processor functions. 

C. OPERATIONS FOR WHICH NO MAPPING EXISTS 

The SQL options that cannot be supported by MDBS are re- 
lated to the relational join operation and to the sorting 
capability commonly found in relational systems. MDBS, 
which is not a relational system but an attribute-based sys- 
tem, does not support either the join or the sort operation. 

In order to provide a f ully-f unctional relational inter- 
face to MDBS, some provision must be made to implement these 
operations. There are two choices. First, the join and 
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sort operations could be implemented in MDBS. Second, these 
operations could be preformed by additional software running 
on the host. Further research will be required to identify 
the costs and the tradeoffs of these two alternatives. 
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APPENDIX A: FORMAL SPECIFICATION OF DML FOR ATTRIBUTE-BASED 

LANGUAGE 



The following is the BNF for the attribute-based data 
manipulation language developed by Hsiao and Menon [Ref ] . 
Square brackets [ ] are used to indicate optional con- 
structs. 



Predicate 


• zz 


attribute rel_op value 


attribute 


: = 


char_string 


attr ibute_be ing_modi f ied 


: = 


attribute 


base_attr ibute 


: = 


attribute 


value 


i = 


string 
1 number 
1 float 


Conjunct 


• ~ 
« 


(Predicate) 

1 (Conjunct / Predicate) 


Query 




Conjunct 

1 Query / Conjunct 


Stat 




AVG | MAX | MIN | SUM | COUNT 


list el 


: s 


Stat (attribute) 


list 


• 


attribute 
I 1 i s t_e 1 
I list , attribute 
1 list,list_el 


Target_list 


: = 


(list) 


Attrib_val_pai r 


: = 


<attr ibute , value > 


Hal f_record 


• = 


Attrib_val_pair 
1 Half_record, Attrib_val_pai 


Record 


# 


(Half_record) 



80 



Pointer 


• — 
• 


number 


Modifier 


♦ — 
# 


type-0 
1 type-I 
1 type-II 
I type-III 
1 type-IV 


type-0 


• 


<attribute_being_modif ied 
value> 


type-I 


: = 


<attribute being modified 
exprl> 


type-II 


i — 


<attribute_being modified 
expr 2> 


type-I II 


• = 


<attribute_being_modif ied 
expr2 of Query> 


type-IV 


• “ 
♦ 


<att r i bute_be ing_mod i f ied 
expr2 of Pointer> 


Request 




Insert 
1 Delete 
1 Update 
1 Retrieve 


Insert 


: = 


INSERT Record 


Delete 


: = 


DELETE Query 


Update 


# “ 


UPDATE Query Modifier 


Retrieve 


• — 
• 


RETRIEVE Query Target list 
[BY Attribute] 

[WITH Pointer] 


uc letter 


# ~ 


A | B | C | . . . | Z 


string 


j = 


uc_le tte r 
1 string uc_letter 


lc letter 


: = 


a | b 1 c 1 . . . 1 z 
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cha r_str ing 


:= uc_letter 

I char_string lc_letter 


digit 


:= 0 | 1 | 2 | ... | 9 


number 


: = digit 

1 digit number 


float 


:= number . numbe r 


add op 


:= + | - 


mult op 


:= * 1 / 


exprl 


:= arith_terml 

1 exprl add-op arith terml 


ar i th_terml 


:= arith factorl 

1 arith terml mult op 
ari th_factorl 


arith factorl 


:= attribute being modified 
1 number 


expr 2 


:= arith_term2 

1 expr2 add_op arith_term2 


ar i th_term2 


:= ar i th_f actor 2 

1 arith term2 mult_op 
arTth factor2 


ar ith_factor2 


:= base_attr ibute 
I number 
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APPENDIX B: FORMAL SPECIFICATION OF DML FOR SQL MAPPING 



The following is the BNF for the SQL query language to 
MDBS query language mapping. Square brackets [ ] are -used 
to indicate optional constructs. 



dml_statement := 


selection; 

1 insertion; 
I deletion; 

I update; 


insertion := 


INSERT INTO table_name VALUES 
insert_spec 


insert spec := 


selection 
1 literal 


deletion := 


DELETE FROM table name 
[ where clause ] 


update := 


UPDATE table_name set_clause_l i st 
[ where_clause ] 


where_clause := 


WHERE boolean 


set_clause_l ist := 


set_clause 

| set clause_list , set clause 


set_clause := 


SET field_name = expr 
I SET field_name = ( selection ) 


selection := 


query_block 
I ( selection ) 


query_block := 


select clause FROM table name 
[ WHERE boolean ] 

[ GROUP BY attribute ] 


select clause := 


SELECT sel expr list 
| SELECT * 
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sel_expr_l ist 


• 


sel_expr 

1 sel_expr_l ist , sel_expr 


sel_expr 


: = 


field name 

1 stat ( field name ) 


boolean 


: = 


boolean_term 

| boolean OR boolean_term 


boolean_term 


: = 


boo lea n_f actor 






1 boolean_term AND boolean_f actor 


boo lean_f actor 


: = 


[ NOT ] boolean_pr imary 


predicate 


• 


attribute comparison value 
I attribute BETWEEN value AND val 
I attribute NOT BETWEEN value AND 
value 

1 attribute comparison table_spec 
I table_spec comparison 
f ul l_table_spec 


f ull_table_spec 


• * ' m 


table_spec 
I value 


table_spec 


• ~ 
• 


query_block 
I ( selection ) 
1 ( literal ) 


expr 


l = 


ar i th_term 

1 expr add_op arith_term 


ar ith_term 


J ss 


arith_factor 






I arith term mult op arith factor 


ar ith_factor 


; = 


primary 


primary 


j = 


value 
1 ( expr ) 


comparison 


• — m 


rel op 
I IN 

I NOT IN 


rel_op 


• * 


= 1 <> 1 < 1 > 1 <= 1 >= 


add_op 


• — 
• 


+ 1 - 
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mul t_op 


• — 
# 


* i / 




stat 




AVG | MAX | MIN | 


SUM | COUNT 


literal 


; s 


lit tuple_list 
1 lTt tuple 




1 i t_tuple_l ist 


: = 


lit_tuple 
1 1 i t_tuple_l ist 


, lit_tuple 


li t_tuple 


; = 


value 




table_name 


: = 


attribute 




f ield_name 


; = 


attribute 




attribute 


: = 


char_str ing 




value 


i = 


string 
1 number 
1 float 




char_str ing 


• 


uc_letter 
I char_string lc_ 


letter 


string 


• — 


uc_letter 
I string uc_letter 


uc_letter 


• — 
• 


A | B | C | . . . | 


Z 


lc letter 




a | b | c | . . . | 


z 


number 


• 


digit 

| number digit 




float 


• — 
• 


number . number 




digit 


• — 


0 | 1 | 2 | ... | 


9 
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